PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing
Abstract
Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Because document term vocabularies are large, peers joining the network cause a huge number of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures and changing the granularity of the inverted index to peer level. We show that indexing costs can be reduced significantly further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
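The batching idea in the abstract can be illustrated with a toy model. The sketch below is not the PCIR protocol itself: it assumes a peer-level inverted index (term mapped to the set of peers holding it), models the DHT as an in-memory dictionary, counts one message per DHT insert, and ignores the messages peers exchange inside a group. The function names, peer identifiers, and term sets are all hypothetical.

```python
from collections import defaultdict


def publish_individually(peer_terms):
    """Each peer inserts every one of its terms into the DHT on its own."""
    index = defaultdict(set)  # term -> set of peer ids (peer-level index)
    messages = 0
    for peer, terms in peer_terms.items():
        for term in terms:
            index[term].add(peer)
            messages += 1  # assumption: one DHT message per (peer, term)
    return dict(index), messages


def publish_in_groups(peer_terms, groups):
    """Peers of a group merge their postings and publish one batch per term."""
    index = defaultdict(set)
    messages = 0
    for members in groups:
        # Merge the postings of all group members before touching the DHT.
        batch = defaultdict(set)
        for peer in members:
            for term in peer_terms[peer]:
                batch[term].add(peer)
        for term, peers in batch.items():
            index[term].update(peers)
            messages += 1  # assumption: one DHT message per (group, term)
    return dict(index), messages


# Hypothetical peers and vocabularies, chosen only for illustration.
peer_terms = {
    "p1": {"dht", "index", "peer"},
    "p2": {"dht", "index", "query"},
    "p3": {"dht", "peer", "query"},
    "p4": {"cluster", "dht", "index"},
}
groups = [["p1", "p2"], ["p3", "p4"]]

individual_index, individual_msgs = publish_individually(peer_terms)
grouped_index, grouped_msgs = publish_in_groups(peer_terms, groups)

print(individual_msgs, grouped_msgs)
print(individual_index == grouped_index)
```

In this toy setting both strategies produce the same term index, and the grouped variant sends fewer DHT messages because terms shared within a group are published once. The size of the saving depends on how much the vocabularies of group members overlap, so the numbers here say nothing about the order-of-magnitude reduction reported in the evaluation.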
Similar resources
pNear: combining Content Clustering and Distributed Hash Tables
Full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Currently two promising directions to solve this problem are (1) distributed indexes like hash-tables (DHTs) and (2) semantic overlay networks (SONs) which can be divided into systems that cluster peers with similar content based on term overlap and systems that map both the content and queries on a shared semantic data st...
Scalable Range Query Processing for Large-Scale Distributed Database Applications
Peer-to-peer (P2P) systems provide a robust, scalable and decentralized way to share and publish data. Although highly efficient, current P2P index structures based on Distributed Hash Tables (DHTs) provide only exact match data lookups. This compromises their use in database applications where more advanced query facilities, such as range queries, are a key requirement. In this paper, we give ...
Integrating RDF Querying Capabilities into a Distributed Search Infrastructure
The Semantic Web is inherently distributed, and covers both metadata and full-text information. Semantic search therefore can profit a lot from peer-to-peer infrastructures as well as from powerful metadata search functionalities based on full-text search technologies. In this paper we focus on an approach extending an existing P2P search infrastructure with RDF querying capabilities, which bot...
Efficient Index-based Processing of Join Queries in DHTs
Massively distributed applications require the integration of heterogeneous data from multiple sources. Peer-to-peer (P2P) is one possible network model for these distributed applications and among P2P architectures, distributed hash table (DHT) is well known for its routing performance guarantees. Under a general distributed relational data model, join query operator, an essential component to...
Beyond Term Indexing: A P2P Framework for Web Information Retrieval
Web search over peer-to-peer (P2P) networks shows promise to become an alternative to the state-of-the-art search engines since P2P overlays offer means for decentralized search across widely-distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to potentially unscalable resource (e.g. bandwidth, ...
Journal: Computer Networks
Volume: 54
Publication date: 2010